Curve Boxplot: Generalization of Boxplot for Ensembles of Curves
Fig. 1. Visualization of the order statistics of 27 historic hurricane tracks originating in the Gulf of Mexico between 1920 and 2012 (left)
and a curve boxplot visualization of an ensemble of 50 simulated hurricane tracks (right).

Abstract — In simulation science, computational scientists often study the behavior of their simulations by repeated solutions with 
variations in parameters and/or boundary values or initial conditions. Through such simulation ensembles, one can try to understand 
or quantify the variability or uncertainty in a solution as a function of the various inputs or model assumptions. In response to a 
growing interest in simulation ensembles, the visualization community has developed a suite of methods for allowing users to observe 
and understand the properties of these ensembles in an efficient and effective manner. An important aspect of visualizing simulations 
is the analysis of derived features, often represented as points, surfaces, or curves. In this paper, we present a novel, nonparametric 
method for summarizing ensembles of 2D and 3D curves. We propose an extension of a method from descriptive statistics, data 
depth, to curves. We also demonstrate a set of rendering and visualization strategies for showing rank statistics of an ensemble 
of curves, which is a generalization of traditional whisker plots or boxplots to multidimensional curves. Results are presented for 
applications in neuroimaging, hurricane forecasting and fluid dynamics. 

Index Terms — Uncertainty visualization, boxplots, ensemble visualization, order statistics, data depth, nonparametric statistic,
functional data, parametric curves




1 Introduction 

In many applications, scientists use mathematical or conceptual models to overcome the complexity of
real-world physical phenomena. Recent advances in computational power and the development of
high-performance computing techniques have made it feasible to run numerical simulations significantly
faster. As a result, a simulation can now be run repeatedly for large sets of different parameter values
within practical time frames. When the parameter space for a simulation is too large or too complex to be
explored completely, scientists often rely on an ensemble of runs to i) explore the potential outcomes of
the model or ii) hypothesize about the deficiencies in the model. Ensembles are also used to account for
different types of uncertainties arising at different stages of knowledge acquisition, such as
deficiencies in the model, unknown parameters and limited numerical accuracy.

With an increase in the complexity and dimensionality of data, visualization has become an integral and
essential part of data analysis, and it continues to facilitate knowledge discoveries in various
applications. If designed properly, visualizations can help the user discover or highlight the
characteristic features of the data. As the number of applications using ensembles grows, the need for new
techniques for visualizing ensembles is also increasing. An ensemble visualization scheme needs to respect
the variability between ensemble members and convey such variability properly to the user. In various
applications, domain experts are typically interested in specific derived features of the data rather than
the whole simulation field. Visualization of feature sets requires special treatment as specific criteria
need to be satisfied [60]. The particular choice of the feature of interest depends on the application
domain and the questions being asked. For instance, isocontours are a typical derived quantity of interest
for scalar fields, whereas pathlines, as parameterized curves, are considered one of the dominant feature
sets for flow fields.

Visualizing the uncertainty present in an ensemble requires proper modeling of the variability among the
ensemble members. The uncertainty (or the variability) is often modeled using probability theory [50]. An
ensemble is considered to be an empirical representation or sampling of the underlying unknown distribution
of the data. Extensive studies on the statistical analysis of ensembles can be found in the literature,
including both parametric and nonparametric statistical analysis methods. Using parametric methods often
results in strict assumptions about the underlying distribution and hence can potentially deteriorate the
representation of the intrinsic variability present in the ensemble. Therefore, nonparametric statistical
analysis tools are often more suitable to study the variability of an ensemble [45]. Among various
nonparametric methods, descriptive statistics summarizes the main features of an ensemble with few or no
assumptions about the underlying probability distribution, and is therefore robust to the type of
distribution and to the presence of outliers [8]. Furthermore, the concept of data depth has been proposed
as a natural mathematical concept by which to derive robust descriptive statistical summaries of an
ensemble [62]. Correspondingly, the boxplot visualization adapted to the concept of data depth has been
proposed as a simple data exploratory analysis tool for various types of data including multivariate
points [48], functional data [54] and isocontours [60].

In this paper, we propose a statistical analysis tool using the concept of data depth for studying
ensembles of parameterized curves. We demonstrate how this method can be used to derive robust statistical
summaries allowing analysis of an ensemble of parameterized curves including particle trace pathlines. We
additionally propose a boxplot-based visualization scheme called a curve boxplot as a simple
data-exploratory analysis tool to provide both quantitative and qualitative summaries of the variability
present in an ensemble of parameterized curves. To demonstrate the utility of our proposed method, we
provide a detailed discussion of curve boxplot visualization techniques in comparison with various
state-of-the-art visualization techniques in several applications, including tractography imaging,
hurricane track prediction and fluid dynamics. Our contribution builds upon previous data depth analysis
techniques that aim to provide robust and descriptive statistical representation of ensembles.

2 Background and Related Work 

As uncertainty quantification becomes an integrated component in different areas of science, uncertainty
visualization has been advocated as one of the top challenges in visualization [26, 40, 31]. Our work is
closely related to topics from ensemble visualization, flow field visualization, computational geometry,
and visualization of uncertain vector fields and their feature sets. We provide a brief review of the
nominal work from each of these topics.

Uncertainty visualization for isosurfaces as one of the dominant feature sets of scalar fields has received
significant attention [15, 23]. A class of uncertainty visualization techniques use point-based primitives
such as fuzziness [22] or colors [15] to convey the positional uncertainty associated with volumetric data
and surfaces. These methods provide qualitative indications about the uncertainty associated with the data,
but they fail to provide any quantitative information. Therefore, recent efforts have been devoted to using
statistical analysis techniques in order to provide quantitative information about the uncertainty present
in the data. Parametric models have been used to approximate Level Crossing Probabilities (LCP) [46], based
on which the probabilistic marching cubes algorithm was proposed [47], deployed for ray casting
applications [43], and extended for approximation of global correlation [44]. This body of work provides
quantitative interpretation of the uncertainty with a parametric model assumption. Reliance on parametric
models restricts the capability of capturing the underlying variability in the presence of outliers or
general distributions. Recently, a nonparametric kernel density estimation method [45] has been proposed
for ensemble-based isosurface visualization. Although the proposed approach provides more flexibility in
estimating distributions more general than the normal distribution, the efficiency and effectiveness of
such methods rely heavily on parameter tuning.

The analysis of flow fields and their derived features plays a vital role in different simulation science
applications such as computational fluid dynamics (CFD), medical imaging and numerical weather
prediction [14, 49, 19]. Pathline and streamline visualization (as instances of curves) is one of the most
important tools for studying the intrinsic properties of a vector field, and hence has received significant
attention [59]. One of the main challenges in the visualization of an ensemble of trajectories or
streamlines is the issue of visual clutter. This issue has been addressed by a number of techniques
focusing on a subset of the ensemble of pathlines through preprocessing steps such as streamline
bundling [61], view-dependent rendering [36], selective positioning of the seed points [37] or rendering
techniques such as the density projection method [29]. These techniques are mainly designed to reduce the
complexity of the ensemble of pathlines while preserving the salient features of the underlying vector
field in the absence of uncertainty. Therefore, these pathline ensemble visualization techniques are not
well suited to characterization of the uncertainty associated with an ensemble of pathlines derived from an
uncertain vector field.

Visualization of flow fields and their feature sets in the presence of uncertainty poses new challenges in
comparison to conventional flow field visualization. Various visualization techniques have been proposed to
study and visualize uncertain vector fields including texture-based techniques [9], comparative flow
visualization [32, 57] and ensemble visualization techniques for time varying flow fields [24]. These
techniques mainly use analysis and visualization of various feature sets of flow fields in order to study
the uncertain vector field. Probabilistic modeling and statistical analysis have been deployed for
visualization of various feature sets of vector fields [42]. For instance, the extraction and visualization
of vortex cores [39] and of sinks and sources [38] from uncertain vector fields have been studied in a
probabilistic framework. In our work, we focus on a specific type of feature set from an uncertain vector
field, namely an ensemble of pathlines, streamlines, or trajectories, all of which are heavily used in a
variety of applications [14, 49, 51] and often represented as parameterized curves. As mentioned earlier,
conventional streamline visualization techniques do not lend themselves to proper characterization of the
uncertainty associated with an ensemble of pathlines as parameterized curves. Direct ensemble visualization
techniques have been proposed and shown to be effective in characterizing the uncertainty present in an
ensemble of pathlines in applications such as weather forecasting [51], hurricane track prediction [14],
and tractography [49]. Even though direct visualization of ensembles has been shown to be informative in
specific applications, it fails to provide any quantitative measure of aggregation or dispersion between
ensemble members and places cognitive burden on the user to interpret the variability present in the
ensemble [14]. Similar to level-set crossing probabilities for modeling positional uncertainty of
isosurfaces in a scalar field, the spatial distribution of pathlines has been studied and visualized for
blood flow measurements [19] and tractography applications [41] as an alternate visualization scheme to
direct ensemble visualization. This class of methods provides quantitative summaries of an ensemble of
pathlines while the volume rendering of the probability map provides only a qualitative visualization of
the potential position of the pathlines.

Another class of closely related, but distinct, analysis techniques in the computational geometry
literature aims to study curves purely as geometrical objects. This class of statistical analysis
techniques typically relies on curve similarity metrics [11, 5, 17]. These metrics entail registration,
deformation [35], alignment [52] and, in some cases, clustering [20], in order to summarize geometric
properties of an ensemble. These methods provide an important suite of analysis tools in applications such
as medical imaging and statistical shape analysis [28]. However, the analysis of an ensemble of curves
through alignment [52] and deformation has two drawbacks: i) the alignment processes themselves are
nonlinear optimizations and are often sensitive to parameters, initializations, and application-specific
assumptions; and ii) alignment-based methods represent variability only indirectly, through some subsequent
analysis of the coordinate transformations or deformations needed to align.

In this work, we propose a more direct approach for statistical analysis of curves. We propose a new
methodology (that complements more heavyweight methods) to study and visualize the variability present in
an ensemble of curves in a more general setting based on the generalization of the notion of data depth.
Data-depth analysis and boxplot visualization have been studied and successfully applied to ensembles of
various data types. In the next section, we provide a self-contained introduction to the notion of data
depth as the building block for our proposed generalization of band depth.

3 Data Depth and Its Generalizations 

Nonparametric statistical analysis methods are a branch of statistics in which there is no (or minimal)
model assumption regarding the underlying distribution, which will in turn provide statistical information
that is more faithful to the data. Many nonparametric methods have been designed for studying the
underlying distribution that gave rise to the ensemble, whereas often the domain experts are interested in
robust statistical summaries of the ensemble. Descriptive statistics, a class of nonparametric statistical
analysis tools, is designed to provide the main features of an ensemble without estimating or representing
the underlying distribution.

Data depth is an elegant and powerful method to derive descriptive statistics such as ordering and
percentile information with few assumptions about the underlying distribution. Moreover, a proper
generalization of the concept of data depth for multivariate and complicated data types such as functions
and isocontours has been shown to be sensitive to complex features such as shapes [60, 54]. The
feature-sensitivity of the notion of data depth guarantees the robustness of the statistical summaries in
the presence of outliers and noise [8]. We will formally introduce the notion of data depth and boxplots as
a simple and effective ensemble visualization technique that represents the statistical summaries provided
by the data depth. This introduction to data depth provides the fundamental concepts required for the
generalization of band depth to multivariate curves.

Given an ensemble of data drawn from a distribution F, data depth quantifies how central (or deep) a
particular sample is within the cloud of the sampled data. The deeper samples are considered more
representative of the ensemble and are assigned high depth values, whereas samples farther away from the
rest of the ensemble are considered to be outliers and are correspondingly assigned lower depth values.
Therefore, the notion of data depth provides a center-outward ordering (also known as order statistics) for
an ensemble of sampled data.

The order statistics induced by data depth can be used to provide robust, descriptive statistical
information about the ensemble including the most representative ensemble member (i.e., the median),
percentile information and also detection of the potential outliers. In the univariate case, merely sorting
the values provides enough information to induce the depth values. The univariate boxplot introduced by
Tukey [55] is a simple data-exploratory analysis tool designed based on the notion of data depth to
summarize the descriptive statistical quantities induced from the ensemble such as the median, the first
and third quartiles (i.e., 50% of the data), the nonoutlying minimum and maximum values (i.e., 100% of the
data), and the potential outliers. A typical boxplot is represented in Figure 2.
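
As a minimal sketch of the univariate case, the following Python snippet sorts a sample and derives the boxplot quantities listed above. The function name `boxplot_summary`, the "inclusive" quantile method, and the conventional 1.5 × IQR fence used to flag outliers are assumptions of this illustration; the text above does not fix them.

```python
import statistics

def boxplot_summary(values, fence=1.5):
    """Summarize a univariate sample with boxplot quantities.

    The 1.5 * IQR fence is the conventional choice and an assumption here.
    """
    data = sorted(values)
    # Quartiles from the sorted sample (linear interpolation, endpoints included).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    inliers = [v for v in data if low <= v <= high]
    outliers = [v for v in data if v < low or v > high]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_min": inliers[0],   # nonoutlying minimum
        "whisker_max": inliers[-1],  # nonoutlying maximum
        "outliers": outliers,
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary["median"], summary["q1"], summary["q3"])  # 5.5 3.25 7.75
print(summary["outliers"])                              # [30]
```

In this toy sample the value 30 lies beyond the upper fence and is reported as an outlier, while the whiskers end at the nonoutlying extremes 1 and 9.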

The extension of sorting to higher dimensions or more complicated data types such as functional data is a
nontrivial task, and therefore proper measures of data depth must be devised based on the type of the data
and the application. There are various generalizations for the notion of data depth to higher
dimensions [48, 62, 30], from which the notion of functional band depth is specifically designed for
ensembles of functions [33]. The main distinction between the functional band depth and other
generalizations of the data depth to higher dimensions is that functional band depth goes beyond pointwise
analysis of functional data. Functional band depth provides a measure of centrality of a function among an
ensemble of functional data that is sensitive to both the shape and the position of a function in
comparison to the rest of the ensemble members. Based on the notion of functional band depth, the
functional boxplot has been proposed as an extension of the univariate boxplot [54].

3.1 Functional Band Depth 

In its simplest form, a function is defined as a 1D mapping from a subset of the real line called the
domain to another subset of the real line called the codomain as:

$$ f : \mathcal{D} \to \mathcal{R}, \qquad \mathcal{D}, \mathcal{R} \subset \mathbb{R}. \tag{1} $$

Pointwise statistics of ensembles of functional data will underestimate the correlation between values of
the function over its domain and global features such as the shape of the function. Therefore, centrality
of a function among an ensemble of functions should be evaluated based on all the values in the domain.
Functional band depth is an elegant and mathematically well-defined statistical concept that defines a
measure of centrality of a function among an ensemble of functions based on its graphical representation.
Considering an ensemble of $n$ functions, $\{f_1(x), f_2(x), \ldots, f_n(x)\}$, the band depth of each
ensemble member is defined as the probability of the inclusion of its graphical representation within the
band formed by a random selection of $j$ other functions from the ensemble. The band in this context is
defined as the region on the plane enclosed by $j$ functions as:

$$ B(f_1(x), \ldots, f_j(x)) = \Big\{ (x, y) : x \in \mathcal{D},\ y \in \mathcal{R},\ \min_{k=1,\ldots,j} f_k(x) \le y \le \max_{k=1,\ldots,j} f_k(x) \Big\}. \tag{2} $$

The band formed by two functions is shown in blue in Figure 3. An arbitrary function $g(x)$ lies in the
band formed by $j$ randomly selected functions $f_1(x), \ldots, f_j(x)$ only if it satisfies the following
property:

$$ g(x) \subset B(f_1(x), \ldots, f_j(x)) \quad \text{iff} \quad \Big\{ \forall x \in \mathcal{D} : \min_{k=1,\ldots,j} f_k(x) \le g(x) \le \max_{k=1,\ldots,j} f_k(x) \Big\}. \tag{3} $$

Fig. 3. Demonstration of an ensemble of five functions. The band formed by $f_2$ and $f_4$ is shaded in
blue. Notice that $f_3$ is fully contained in the band, whereas $f_5$ falls inside the shaded region (i.e.,
the band) only 25% of the time.

For any fixed value of $j \ge 2$, the band depth of a function $g(x)$ among the ensemble of functions can
be defined as the probability of the inclusion of its graphical representation in random bands formed by
$j$ other ensemble members:

$$ BD_j(g(x)) = \mathrm{Prob}\big[ g(x) \subset B(f_{i_1}(x), \ldots, f_{i_j}(x)) \big], \qquad 1 \le i_1 < \cdots < i_j \le n, \tag{4} $$

where Prob[ ] denotes the probability and the double indices denote a random selection of $j$ functions
from the ensemble. The indicator function associated with Eq. (3) can be viewed as a binary random
variable. Therefore, the probability Prob[ ] can be computed by the fraction of the bands for which Eq. (4)
is satisfied. Intuitively, functions close to the center of the distribution have a higher likelihood of
being contained in a random band compared to the functions far from the center of the distribution. The
value for $j$ and the number of random bands are the two parameters that affect the robustness of the
computed data depth values. It has been shown that using different $j$ values to compute the depth values
as:

$$ BD_J(g(x)) = \sum_{j=2}^{J} BD_j(g(x)), \tag{5} $$

can provide more robust depth values [54]. On the other hand, the number of random bands used to compute
the depth values should be large enough to prevent degenerate cases. If the number of random bands formed
by the ensemble members is low, many of the ensemble members will be assigned similar (and potentially
zero) depth values. One can use all possible bands formed by the ensemble members to compute the band depth
values [54]. For instance, for an ensemble of size 10 and $j = 2$ one can form C(10,2) = 45 bands using any
two ensemble members to compute the band depth for each of the ensemble members.
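
The definitions in Eqs. (2) to (4) can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: it assumes every function is sampled on a shared grid, it enumerates all C(n, j) bands as suggested above, and the name `band_depth` is hypothetical.

```python
from itertools import combinations

def band_depth(functions, j=2):
    """Band depth of every member, using all bands formed by j members.

    `functions` is a list of equal-length sample lists on a shared grid
    (an assumption of this sketch).
    """
    n_points = len(functions[0])
    bands = list(combinations(range(len(functions)), j))
    depths = []
    for g in functions:
        contained = 0
        for band in bands:
            # g is in the band only if it lies between the pointwise
            # minimum and maximum of the band members at every grid point.
            if all(
                min(functions[k][t] for k in band)
                <= g[t]
                <= max(functions[k][t] for k in band)
                for t in range(n_points)
            ):
                contained += 1
        # Fraction of bands that fully contain g.
        depths.append(contained / len(bands))
    return depths

# Five constant functions sampled at three grid points: C(5, 2) = 10 bands.
ensemble = [[float(level)] * 3 for level in range(5)]
print(band_depth(ensemble, j=2))  # [0.4, 0.7, 0.8, 0.7, 0.4]
```

The middle function receives the largest depth and the two extreme functions the smallest, which is the center-outward ordering described earlier.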




Fig. 2. Boxplot: a data exploratory visualization tool.
Fig. 4. Illustration of a functional boxplot: (a) An ensemble of 35 canonical step functions where the ensemble members are colored at random.
The ensemble is generated by randomly perturbing the horizontal position of the jump. A single outlying example has been generated whose shape
is significantly different from the rest of the ensemble. This ensemble member has been highlighted in red. (b) A functional boxplot visualization of
the ensemble [54]. The coloring scheme follows the 1D boxplot visualized.


As mentioned earlier, one of the main characteristic features of the band depth definition is its shape and
position sensitivity. Ensemble members with significant differences between their positions and the rest of
the ensemble members or with too much shape variability are flagged as outliers (red curves in Figure 4).
The purple line in Figure 4(b) represents the pointwise mean of the ensemble, which appears to be very
different from any of the other ensemble members. This example illustrates that in the presence of
outliers, the mean of the ensemble is not a good representative of the ensemble, which is an established
concept in the statistical literature [16].

In the presence of noise that translates into too much variability among the ensemble members, or with a
limited number of samples in the ensemble, the data depth values can be very low or close to zero simply
because not many ensemble members are fully contained in the bands. In order to overcome this challenge, a
more flexible definition of band depth, called the modified band depth, was proposed [33]. The modified
band depth measures the portion of time that a function lies inside the band. For instance, $f_5$ in
Figure 3 is not fully contained in the band, but it falls inside the colored band 25% of the time. Modified
band depth provides more reliable results and prevents degeneracy in the presence of too much variability
among the ensemble members, at the price of reducing the shape sensitivity of the method. Modified band
depth allows functions that are approximately (and not fully) in the band to be assigned nonzero depth
values (i.e., relaxation of the binary evaluation of Eq. (4) into a percentage).
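
The relaxation from a binary containment test to a fraction can be sketched as follows. The sketch assumes functions sampled on a shared, uniformly spaced grid, so that "portion of time" becomes the fraction of grid points inside the band; the names `fraction_inside` and `modified_band_depth` are hypothetical.

```python
from itertools import combinations

def fraction_inside(g, members):
    """Fraction of grid points at which g lies inside the band of `members`."""
    hits = sum(
        1
        for t in range(len(g))
        if min(f[t] for f in members) <= g[t] <= max(f[t] for f in members)
    )
    return hits / len(g)

def modified_band_depth(functions, j=2):
    """Average, over all bands of j members, of the fraction of g inside the band."""
    bands = list(combinations(functions, j))
    return [
        sum(fraction_inside(g, band) for band in bands) / len(bands)
        for g in functions
    ]

f1 = [0.0, 0.0, 0.0, 0.0]
f2 = [1.0, 1.0, 1.0, 1.0]
f3 = [0.5, 0.5, 2.0, 2.0]  # leaves the band of f1 and f2 halfway through

print(fraction_inside(f3, [f1, f2]))      # 0.5
print(modified_band_depth([f1, f2, f3]))  # f3 still receives a nonzero contribution from the (f1, f2) band
```

Under the strict definition, `f3` would contribute nothing for the band formed by `f1` and `f2`; here it contributes 0.5 for that band.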

Motivated by the widespread use of derived features of scalar fields in scientific visualization, the
notion of band depth has been generalized to sets and isocontours, based on which contour boxplot
visualization has been proposed [60]. The notion of band depth can also be extended for ensembles of dense
fields of data. A field of data can be represented as a surface: $S : \mathcal{D} \to \mathcal{R}$ where
$\mathcal{D} \subset \mathbb{R}^2$ and $\mathcal{R} \subset \mathbb{R}$. One can use volume-based surface
band depth [54] for statistical analysis of an ensemble of fields of data (i.e., surfaces) [21]. In the
next section, we will define a generalization of the notion of band depth for parameterized curves in
higher dimensions.

Statistical properties of the notion of data depth and specifically band depth have been widely studied in
the statistics literature [30], where it has been shown that the sample mean of the probability associated
with Eq. (4) converges to its expected value as the size of the ensemble goes to infinity. The ensemble
member with the maximum depth value will converge to the center of a symmetric distribution (i.e., the
median of the distribution), whereas the depth value for an outlier will converge to zero. Thus, the
descriptive statistics induced by data depth and boxplot visualizations provide a robust, interpretable
representation of a distribution.

4 Methods 

4.1 Generalization of Band Depth for Multivariate Curves 

Just as the upper and lower envelope of a set of scalar functions defines a band, multivariate functions
define a band by the geometric extent of the points in the codomain. Thus, a natural extension of the
notion of band depth to multivariate functions and parameterized curves in higher dimensions (presented
here, and also developed independently by Pintado et al. [34]) is determined by containment in the simplex
or convex hull formed by points in the codomain. However, this assumes a predefined parameterization for
every curve. A parameterized curve can be defined in terms of an independent parameter $s$ as:

$$ c(s) = x(s), \qquad c : \mathcal{D} \to \mathcal{R}, \qquad \mathcal{D} \subset \mathbb{R},\ \mathcal{R} \subset \mathbb{R}^d, \tag{6} $$

where $x \in \mathbb{R}^d$ is the spatial location of the curve at a specific point in the domain (see
Figure 5). As the value of the independent parameter $s$ changes in the domain $\mathcal{D}$, $c$ traces a
curve in $\mathbb{R}^d$. In what follows, we introduce the generalization of the notion of band depth for
parameterized curves.

Fig. 5. A typical 2D parameterized curve.

Fig. 6. The band formed by three dashed curves. The corresponding points along the curves forming the band
have been highlighted in blue. The red curve is fully contained in the band based on the definition given
in Eq. (8), whereas the green curve does not fall inside the band.

Considering an ensemble of $n$ curves in $\mathbb{R}^d$: $\{c_1, \ldots, c_n\}$, where the correspondence
between the ensemble members is established based on the independent parameter $s$, we can define the
generalized notion of band depth using a new definition for the band and the concept of inclusion in the
band. A band formed by $j$ ensemble members in this context can be defined as:

$$ B(c_1(s), \ldots, c_j(s)) = \big\{ (s, y) : s \in \mathcal{D},\ y(s) \in \mathcal{R},\ y(s) \in \Delta(x_1(s), \ldots, x_j(s)) \big\}, \tag{7} $$

where $x_i(s)$ denotes the point on $c_i(s)$ and $\Delta(x_1(s), \ldots, x_j(s))$ denotes the geometrical
convex hull formed by $x_1(s), \ldots, x_j(s)$ in $\mathbb{R}^d$ (see Figure 6). Note that in the
functional band depth, the band is defined as the region enclosed by the graph of $j$ functions.
Equivalently, we have defined the band as the region enclosed between the corresponding points along $j$
curves in $\mathbb{R}^d$ using the notion of a convex hull. Now, we can introduce the notion of inclusion
in the band in $\mathbb{R}^d$. A parameterized curve $g(s)$ is fully contained in the band if all the
points along $g(s)$ satisfy the following relation:

$$ g(s) \subset B(c_1(s), \ldots, c_j(s)) \quad \text{iff} \quad \big\{ \forall s \in \mathcal{D} : x(s) \in \Delta(x_1(s), \ldots, x_j(s)) \big\}. \tag{8} $$

Similar to functional band depth, the measure of centrality of an ensemble member is defined as the sample
probability of its inclusion in random bands formed by a random selection of $j$ other ensemble members:

$$ BD_j(g(s)) = \mathrm{Prob}\big[ g(s) \subset B(c_{i_1}(s), \ldots, c_{i_j}(s)) \big], \qquad 1 \le i_1 < \cdots < i_j \le n. \tag{9} $$

The choice of $j$ for the number of the ensemble members to form the band in $\mathbb{R}^d$ needs to be at
least $d + 1$ to define proper convex hulls in Eq. (8). If $j = d + 1$, the convex hull in Eq. (8) defines
a simplex that establishes the connection of this notion of band depth to the simplicial band depth [30].
Consequently, one can use any other notion of centrality defined for multivariate points to replace the
convex hull relation in Eq. (8). For instance, one might instead use the definition constructed from
half-spaces [55]. It is also important to note that this definition of band depth for parameterized curves
coincides with the functional band depth definition when the codomain is defined as a subset of
$\mathbb{R}$.
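
For 2D curves with $j = d + 1 = 3$, the convex hull at each parameter value is a triangle, so the containment test of Eq. (8) reduces to a point-in-triangle test. The sketch below illustrates this special case under stated assumptions: curves are lists of (x, y) samples at shared parameter values, all C(n, 3) bands are enumerated, the strict (not modified) depth is computed, and the names `in_triangle` and `curve_band_depth` are hypothetical.

```python
from itertools import combinations

def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    """True if p lies in the closed triangle (convex hull) of a, b, c."""
    # The bounding-box check keeps the test correct for degenerate (collinear) triangles.
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    if not (min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)):
        return False
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is never on opposite sides of two edges.
    return not (has_neg and has_pos)

def curve_band_depth(curves):
    """Strict band depth of 2D curves using all bands of three members."""
    bands = list(combinations(range(len(curves)), 3))
    depths = []
    for g in curves:
        contained = sum(
            1
            for (i, k, l) in bands
            if all(
                in_triangle(g[t], curves[i][t], curves[k][t], curves[l][t])
                for t in range(len(g))
            )
        )
        depths.append(contained / len(bands))
    return depths

# Three outer curves and one central curve, each sampled at s = 0, 1, 2.
c0 = [(s, 0.0) for s in range(3)]
c1 = [(s, 3.0) for s in range(3)]
c2 = [(s + 3.0, 0.0) for s in range(3)]
c3 = [(s + 1.0, 1.0) for s in range(3)]
print(curve_band_depth([c0, c1, c2, c3]))  # [0.75, 0.75, 0.75, 1.0]
```

The central curve lies in every band and therefore has the largest depth. For $j > d + 1$ or for 3D curves, the triangle test would have to be replaced by a general convex hull containment test.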

4.2 Boxplots for Multivariate Curves 

Here we describe the construction of boxplots for multidimensional parameterized curves. In applications
such as computational fluid dynamics (CFD) or numerical weather prediction (e.g., hurricane track
prediction), evolution time is a natural choice for parameterization of pathlines as curves, and this will
uniquely determine how points are compared in the range (to test for containment, as above). In situations
where time does not provide a natural parameter space, one can reparameterize the ensemble of curves based
on a given or application-specific criterion as a preprocessing step prior to data-depth analysis. Common
choices for reparameterization include arc length parameterization [58] or optimization of properly
designed cost functions [53]. The choices of parameterizations present tradeoffs. The time-based
parameterization specifically accounts for the different actual speeds of the curves or pathlines (e.g.,
hurricane speed), whereas the arc-length parameterization ignores the underlying speed along a curve and
considers only its shape. These choices are inevitably application dependent; here we will consider the
different alternatives in the context of several different applications.
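
As an illustration of the arc-length option, the following sketch resamples a polyline at positions equally spaced in arc length, so that sample $t$ of every curve corresponds to the same fraction of its total length. It assumes piecewise-linear curves with at least two points and at least two output samples; the function name `resample_by_arc_length` is hypothetical, and this is not necessarily the procedure used in [58].

```python
import math

def resample_by_arc_length(points, m):
    """Resample a polyline at m positions equally spaced in arc length.

    Assumes len(points) >= 2 and m >= 2; works in any dimension.
    """
    # Cumulative arc length at every input vertex.
    cum = [0.0]
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]

    resampled = []
    seg = 0  # index of the segment that contains the current target length
    for i in range(m):
        target = total * i / (m - 1)
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        w = 0.0 if span == 0 else (target - cum[seg]) / span
        # Linear interpolation inside the segment.
        resampled.append(
            tuple(a + w * (b - a) for a, b in zip(points[seg], points[seg + 1]))
        )
    return resampled

# An L-shaped curve of total length 4, resampled at five positions
# one unit of arc length apart.
print(resample_by_arc_length([(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)], 5))
```

After such a resampling, the index of a sample plays the role of the shared parameter $s$ when points on different curves are compared.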

For all applications discussed in the following section, the data 
depth analysis was carried out using the modified version of the band 
depth definition. The notion of functional band depth is stable with 
respect to different values of j [54], we also observed the same be¬ 
havior for its generalization for curves. For the 2D examples, we have 
used three points to form the convex hull (i.e., a triangle) and in the 
3D case, we have chosen to use j = 5. Smaller values for j allow for 
more shape sensitivity of the approach and are significantly faster to 
compute [33]. In all our experiments we used all the bands formed by 
subsets of the ensemble members. The depth values induced from data 
depth analysis can then be used to order (or rank) the ensemble mem¬ 
bers based on which we propose a generalization of univariate boxplot 


visualization that we call curve boxplot. In a curve boxplot visualiza¬ 
tion, the median and outlier ensemble members are rendered using the 
color convention used for the univariate boxplot demonstrated in Fig¬ 
ure 4. Multiple design choices can be used to represent the 50% and 
100% band. One can render the ensemble members falling inside the 
50% (or 100%) band with a distinct color. Individual coloring of the 
curves is desirable in situations where some topological phenomena 
such as bifurcations emerge. On the other hand, to provide a
representation of the 50% band equivalent to that of the univariate
boxplot and the functional boxplot, we used the CSG (Constructive Solid
Geometry) union operator to represent the contiguous 50% band swept by
the 50% deepest ensemble members. To construct a solid region
(which is not necessarily convex), we first constructed our primitives
as the convex hulls of the corresponding points at every two consecutive
sample positions along the 50% deepest ensemble members. This ensures
that the region swept by these members is fully covered. These
convex hulls were then combined into a single solid region (i.e., the 50%
band) using sequential union operations.
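The depth-based ordering described in this paragraph can be illustrated with a minimal sketch for 2D curves and j = 3 (triangular bands). It follows the general idea of the modified band depth, namely averaging, over all bands, the fraction of sample positions at which a curve lies inside the band; it is not the C++ implementation used for the results, and the function names (`in_triangle`, `modified_band_depth`, `rank_by_depth`) are hypothetical. The curves are assumed to be sampled at corresponding parameter values.

```python
from itertools import combinations


def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if 2D point p lies inside or on the convex hull of a, b, c."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # The bounding-box check handles degenerate (collinear) triangles.
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    in_box = min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)
    return in_box and not (has_neg and has_pos)


def modified_band_depth(curves):
    """Depth per curve: mean fraction of samples inside each 3-curve band.

    curves: list of equally long lists of (x, y) samples in correspondence.
    """
    n, m = len(curves), len(curves[0])
    triples = list(combinations(range(n), 3))
    depths = []
    for c in curves:
        total = 0.0
        for i, j, k in triples:
            inside = sum(
                in_triangle(c[t], curves[i][t], curves[j][t], curves[k][t])
                for t in range(m)
            )
            total += inside / m
        depths.append(total / len(triples))
    return depths


def rank_by_depth(depths):
    """Indices from deepest (median) to shallowest, plus the 50% deepest."""
    order = sorted(range(len(depths)), key=lambda i: depths[i], reverse=True)
    half = order[: (len(order) + 1) // 2]
    return order, half
```

In this sketch the first index returned by `rank_by_depth` plays the role of the median, and the second return value lists the members that would sweep the 50% band.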

We detect the outliers as the members whose depth value is smaller 
than the inflation of the range of depth values for the 50% band by a 
factor of three [56]. The outliers are shown in red in all examples. In 
the 2D examples, we also render the 100% band in a distinct color; 
however, to prevent cluttered images in the 3D examples, we have 
chosen to show the nonoutlier ensemble members beyond the 50% band
individually in a lighter shade of blue. This approach helps the
user to better track the position of the ensemble members.

5 Results and Applications 

In this section, we discuss experimental results to demonstrate the util¬ 
ity of the curve boxplot visualization to study the variability present in 
an ensemble of parameterized curves and pathlines in various applica¬ 
tions. We present observations on the method from users who studied 
these visualizations relative to the state of the art. Comparisons will 
be provided between alternative application-specific approaches used 
to visualize ensembles and we will comment on why boxplot visual¬ 
ization can be a proper alternative ensemble visualization scheme. For 
each of the applications, we demonstrate the advantages of the data depth
analysis and boxplot visualization in satisfying the domain-specific
criteria established for ensemble visualization. We start the discussion
with a canonical (synthetic) example and then present three more
examples in prediction of hurricane tracks, medical imaging, and fluid 
dynamics. 




Fig. 7. Canonical ensemble data including 40 helical curves in 3D. (a)
Boxplot visualization. (b) Direct visualization of the ensemble members,
where the ensemble members are colored at random.

Our synthetic example consists of an ensemble of analytical 3D 
curves. The ensemble members are generated by discrete sampling 
of a helix equation in 3D: $h(s) = (x(s), y(s), z(s)) = (\cos(s), \sin(s), s)$
Fig. 8. Visualization of an ensemble of 50 simulated hurricane tracks produced using the algorithm proposed by Cox et al. [14]. (a) The error cone:
the primary public visualization provided by the National Hurricane Center (NHC) [3]. (b) Direct ensemble visualization proposed by Cox et al. [14] in
contrast to the error cone visualization.


and randomly varying the position of the helical curve at the sampling 
positions. This example mimics the uncertainty in the position of the 
point along a helix generated from an uncertain 3D vector field where 
we considered the parameter s to represent time. The result of band
depth analysis for our canonical example is shown in Figure 7. It can 
be seen that the boxplot visualization of the ensemble of 3D curves 
provides both qualitative and quantitative information about the vari¬ 
ability among ensemble members while being robust in the presence 
of the outlying members. Note that the most representative ensemble 
member (i.e., the sample median) is fully contained in the band.
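A minimal sketch of how such a synthetic ensemble might be generated is given below. The paper specifies only the helix equation and that the positions are varied randomly at the sampling positions; the Gaussian noise model, the noise level `sigma`, the number of samples, the parameter range, and the function names are all assumptions made for illustration.

```python
import math
import random


def helix(s):
    """Point on the reference helix h(s) = (cos s, sin s, s)."""
    return (math.cos(s), math.sin(s), s)


def make_helix_ensemble(n_members=40, n_samples=100, s_max=4 * math.pi,
                        sigma=0.1, seed=0):
    """Return (params, ensemble) with independently perturbed helix samples.

    Assumption: each coordinate of each sample receives independent
    Gaussian noise with standard deviation sigma.
    """
    rng = random.Random(seed)
    params = [s_max * k / (n_samples - 1) for k in range(n_samples)]
    ensemble = []
    for _ in range(n_members):
        curve = []
        for s in params:
            x, y, z = helix(s)
            curve.append((x + rng.gauss(0, sigma),
                          y + rng.gauss(0, sigma),
                          z + rng.gauss(0, sigma)))
        ensemble.append(curve)
    return params, ensemble
```

Because every member is sampled at the same values of s, the members are in correspondence, which is what the band depth analysis requires.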

Our first demonstrative application is analyzing and visualizing an 
ensemble of predicted hurricane tracks. During the life of a hurricane, 
the National Hurricane Center [3] issues advisories every six hours. 
These advisories usually include information about the current position
of the hurricane, the wind and hurricane speed, the hurricane's current
bearing, and a prediction about its future position and intensity.
The primary visualization provided to the public is called the error 
cone or cone of uncertainty, whose center represents the center of the 
predicted hurricane and the width is determined based on the historical 
forecast error of the past five years [14] (see Figure 8 (a)). The cone 
represents the region enclosed by two-thirds of actual hurricanes that 
were not predicted correctly [3], modified by the experienced subjec¬ 
tive input of hurricane professionals. 

Based on the observation that the error cone usually gives the wrong 
(public) impression about the probabilistic nature of the hurricane 
track predictions, an alternative visualization approach was proposed 
by Cox et al. [14]. This approach uses direct visualization of an ensemble
of possible hurricane tracks generated based on the historical
data and the current advisory information available. The ensemble of
possible tracks is generated such that they are statistically consistent
with the error cone. The visualization approach proposed by Cox et
al. [14] is shown in Figure 8(b). A user study [14] showed that, compared
to the error cone visualization, the direct visualization of the possible
hurricane tracks is more informative about the uncertainty and
unpredictability of the predicted hurricane track. However, the
user study also showed that working with direct visualizations is
cognitively more difficult than interpreting the error cone, which explains
why the NHC has been hesitant to adopt a spaghetti plot approach
for public consumption. This challenge is one of the main motivations
for using alternative visualization approaches, such as the contour
boxplot, instead of spaghetti plots in weather forecast applications [60].

Figure 9 demonstrates the curve boxplot visualization of the en¬ 
semble based on the data depth analysis proposed in Section 4.1 with 
different choices of curve parameterization. The life-time percentage
parameterization reported in Figure 9(c) is an example of an
application-specific parameterization. Hurricane-expert meteorologists are interested in
studying the spatial variability of an ensemble of hurricane tracks over 
a specific period of time, for instance from the initiation until the land¬ 
fall. A suitable choice of parameterization for this type of analysis 


is achieved through sampling a hurricane track based on percentages 
of its total arc length. We denote this choice of parameterization as 
life-time percentage. 
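The life-time percentage parameterization described above can be sketched as a resampling of each track at equal fractions of its total arc length. The sketch below assumes that a track is a polyline of coordinate tuples and that Euclidean distance between samples is an acceptable length measure (for longitude/latitude data a geodesic distance may be more appropriate); the function name is hypothetical.

```python
import math


def resample_by_arclength_percentage(track, n_samples):
    """Resample a polyline at equal fractions of its total arc length.

    track: list of at least two coordinate tuples; n_samples must be >= 2.
    Returns n_samples points at 0%, ..., 100% of the total length.
    """
    # Cumulative arc length at each original sample.
    cum = [0.0]
    for p, q in zip(track, track[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]
    if total == 0.0:
        return [tuple(track[0])] * n_samples
    out = []
    seg = 0
    for k in range(n_samples):
        target = total * k / (n_samples - 1)
        # Advance to the segment that contains the target length.
        while seg < len(track) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        w = 0.0 if span == 0.0 else (target - cum[seg]) / span
        p, q = track[seg], track[seg + 1]
        out.append(tuple(a + w * (b - a) for a, b in zip(p, q)))
    return out
```

After this resampling, the k-th sample of every track corresponds to the same percentage of that track's length, regardless of how fast the hurricane moved.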

As shown in the figure, various choices of parameterization affect 
the sensitivity of the band depth analysis to various types of outliers. 
While the time parameterization is more sensitive to velocity
outliers, a parameterization-dependent feature, the arc-length and
life-time percentage parameterizations are more sensitive to shape and
positional outliers. Hurricane tracks whose bearing pattern is very
different from the rest of the ensemble are considered shape outliers,
whereas tracks whose spatial location is far from the other
members are considered positional outliers. In addition to the aggregated
statistical quantities such as outliers and the most representative
hurricane track (i.e., the median), the curve boxplot also provides qualitative
visualization of the spatial extent of the possible hurricane tracks by 
showing the 50% and 100% bands. Therefore, this visualization satis¬ 
fies the criteria discussed in [14, 60]. 

The curve boxplot visualization in Figure 9 was presented to a 
group containing approximately 10 hurricane experts at the National 
Hurricane Center (NHC). We conducted a “Qualitative Result
Inspection” (QRI) study via a walk-through presentation of our
technique, with contrasts to other techniques used in the literature for the
same tasks [25]. We then spent a day examining the procedures used
at NHC for generating their forecasts and for generating their visu¬ 
alizations. This one-day exercise has led to further collaboration in 
which members of NHC have provided us data, from which we passed 
back visualizations for their examination and comment. This process 
is ongoing. These experts were interested in the promise the curve
boxplot visualization shows for enhancing the analysis and visualization
of ensembles in comparison to the alternative techniques currently
deployed.

• These users expressed appreciation for the precise
quantitative interpretation that is made available in these
visualizations.

• The most representative ensemble member in the boxplot visu¬ 
alization is considered to be the member with the highest depth 
value. These users agreed that choosing the most representa¬ 
tive member from the ensemble in comparison to other alterna¬ 
tive aggregation techniques, such as the track generated from the 
mean wind fields (not necessarily a member of the ensemble), 
ensures that the member that is highlighted as most represen¬ 
tative is consistent with the physical and simulation constraints. 
In contrast, the mean track used in other visualization methods
might not be representative of the population or even physically
feasible (see Figure 4 for the functional data example).

• The cognitive load of direct ensemble visualization (e.g., noodle
or spaghetti plots) [14] currently prevents its deployment to the 
Fig. 9. Curve boxplot visualization of the ensemble of hurricane tracks
presented in Figure 8 using the generalized band depth. (a) Time
parameterization. (b) Arc-length parameterization. (c) Life-time
percentage parameterization. The hurricane tracks are rendered in the
background with the same color as the band they fall inside.


public. On the other hand, the single aggregated quantity visual¬ 
ized in the cone of uncertainty may result in misinterpretation of 
the actual uncertainty. In comparison to these two visualization 
techniques, the curve boxplot provides various percentile-level 
information through visualization of rank statistics and bands, 
which these users felt may alleviate misconceptions about the 
meaning of the standard cone visualization. 

• Outlier visualization may not be beneficial to the public. How¬ 
ever, the visualization of outliers can be beneficial for model¬ 
ers/experts who will want to understand the full variability of 
the model and the types of data that were excluded from the full 
cone. 

• These users felt that the shape (and position for landfall) of the 
hurricane tracks was, in some instances, more important than 



Fig. 10. 27 historic hurricane tracks that originated in the Gulf of Mexico
between 1920 and 2012, retrieved from the historical hurricane track repository
provided by NOAA [4]. (a) Direct ensemble visualization. (b) Visualization
of the band depth analysis of the historic hurricane tracks using the
arc-length parameterization; the tracks falling inside the 50% band are
in a darker color and the tracks inside the 100% band are in a lighter color.

its speed. Thus, for certain applications they preferred the
arc-length and life-time percentage parameterizations.

This initial qualitative domain-expert study motivates a
subsequent evaluation of how the curve boxplot visualization is perceived
by a broader audience, including nonexperts, and of the potential
deployment of the curve boxplot visualization in the hurricane forecast
workflow.

In addition to the simulated hurricane tracks in the previous ex¬ 
ample, we also carried out the data depth analysis on an ensemble 
of 27 historic hurricane tracks that originated in the Gulf of Mexico
(a circular region of size 65 nautical miles centered at 25N by 85W) 
since 1920. The ensemble was retrieved from the historic hurricane 
track repository provided by NOAA [4]. Figure 10(a) demonstrates 
the direct ensemble visualization of the historic hurricane tracks as the 
primary visualization provided [4] and Figure 10(b) shows the visu¬ 
alization of the ensemble where hurricane tracks are colored based on 
their rank statistics induced from data depth analysis. Coloring in¬ 
dividual tracks based on their depth values is considered as a simple 
alternative visualization to boxplot visualization. The arc-length
parameterization used for the hurricane tracks is adopted from the previous
example, where the life cycle of the hurricanes for this example is
96 hours. Note that the median, as the most representative ensemble
member, makes landfall near Pensacola, Florida, and the hurricane tracks
falling inside the 50% band mainly head toward west Florida, which is
consistent with hurricane-risk analysis for the Gulf Coast [2].



Fig. 11. Ensemble of tracts originating from 50 seed points at the center
of the corpus callosum, generated using the Camino toolkit [13]. (a) Direct
visualization of the ensemble where the tracts are colored based on FA values
using the colormap presented. Lower FA values correspond to more
isotropic diffusion while higher values correspond to more anisotropic or
directional diffusion. (b) Connection probability map of the ensemble of
tracts. (c) Boxplot visualization of the tracts: the 50% band is colored
based on FA values using the colormap presented. (d) Zoomed-in view
of the boxplot visualization along with the colormap used.

Our third set of experimental results is motivated by applications of 
the proposed method in medical imaging. The connectivity between 
different locations inside the brain is studied in medical applications 
in order to reconstruct the neurological function of different regions of 
the brain. Tractography algorithms are among the most widely used 


methods to infer these connectivities. Tractography algorithms gen¬ 
erate particle trace pathlines in terms of parameterized curves (also 
known as tracts ) by propagating trajectories from specified seed points 
inside the brain based on direction information extracted from diffu¬ 
sion weighted magnetic resonance imaging (DW-MR). The directional 
information is based on the anisotropy of the diffusion process at each 
voxel of the image. A scalar map called the fractional anisotropy (FA) 
map is often used to specify the degree of anisotropy of a diffusion 
process at each location of the image. A value close to 0 on the FA
map corresponds to free or equally likely diffusion in all directions,
whereas a value close to 1 indicates strongly directional diffusion.
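For reference, the two extremes of the FA scale can be reproduced with the standard definition of fractional anisotropy in terms of the three eigenvalues of the diffusion tensor. This formula comes from the general diffusion imaging literature rather than from this paper, and the function name is hypothetical.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Standard FA from the three diffusion tensor eigenvalues.

    FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||
    """
    mean = (l1 + l2 + l3) / 3.0
    num = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0.0:
        return 0.0  # no diffusion at all; treated as isotropic here
    return math.sqrt(1.5 * num / den)
```

Equal eigenvalues (isotropic diffusion) give an FA of 0, while a single nonzero eigenvalue (diffusion along one direction only) gives an FA of 1.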

The directional information from the DW-MR imaging is prone to 
errors. As a result, there have been different efforts in using proba¬ 
bilistic approaches to follow multiple trajectories initiated in a specific 
part of the brain [18, 27, 49] to account for different sources of error 
including noise and orientation dispersion. The probabilistic tractog¬ 
raphy will result in a collection of streamlines or tracts from each seed 
location in the brain that account for the underlying uncertainty in the 
exact location of the neural fiber tracts in the brain. We have used the 
Camino diffusion MRI Toolkit [13] to generate an ensemble of prob¬ 
abilistic tracts at the center of the corpus callosum. In Figure 11, we 
have shown the boxplot visualization of the ensemble of tracts along 
with two other benchmark methods widely used to study tractography 
data. For this example, we used the curve-parameterization provided 
by Camino [13]. 

The first method is direct visualization of the tracts in which each 
point on the tract has been colored based on the corresponding value 
on the FA map (Figure 11(a)). The second method, shown in Fig¬ 
ure 11(b), is called the connection probability map. The connection 
probability map depicts the probability of the existence of a connection 
between a specific voxel to another voxel of the image, and it is widely 
used in the medical imaging community for this purpose [41, 7]. The 
connection probability map provides little information about how or
where the tracts undergo an increase in variability (diverge). On the 
other hand, the direct visualization provides more insight about how 
the tracts diverge as they leave the seed points, but does not provide 
any quantitative information about the variability. Finally, the boxplot 
visualization of the 50% band in Figure 11(c-d) clearly shows that the
tracts start as a bundle and their dispersion significantly increases as 
the FA values drop. This has been verified by shading the surface of 
the band based on the values of the FA map. We see that the 50% 
band remains narrow in the regions with high values of FA (shaded 
in red). The 50% band widens as the corresponding FA values de¬ 
crease, and expands significantly where the FA values are low (shaded 
in blue). The most representative member is fully contained in the 
band and is therefore not visible in this particular visualization.
Illustrative confidence intervals based on distance-based confidence
measures have previously been proposed to study the variation present in an
ensemble [10]. Data depth analysis can provide more robust confidence
values than distance-based measures, especially in the presence
of different types of outliers. Moreover, the proposed boxplot visualization
not only provides information regarding the confidence intervals
but also presents the most representative and outlying members.

Our last example is a fluid simulation application. Pathlines and 
streamlines are widely used in fluid dynamics to study the presence 
and evolution of structures such as vortices [6]. Among various param¬ 
eters involved in the formation of vortices in the fluid flow, the effect 
of Reynolds number and the boundary conditions such as inlet velocity 
are of interest. We used the 2D incompressible Navier-Stokes solver that is
part of the Nektar++ software package [1] to generate an ensemble of
size 30, where the Reynolds number and the inlet velocity have been 
chosen randomly for various simulation runs. For this demonstration, 
the velocity field in 2D along with analytically calculated vorticity val¬ 
ues (i.e., orthogonal to the velocity field by definition) were used to 
generate the full 3D vector field. An ensemble of pathlines was then 
generated by placing a seed point close to the surface of the cylinder. 
The direct ensemble visualization and the boxplot visualization based 
on time parameterization are depicted in Figure 12. In this visualiza¬ 
tion we see the oscillatory nature of pathlines along the eddy line due 
Fig. 12. An ensemble of 30 particle trace pathlines derived from the 3D vector field of a fluid flow around a cylinder. For demonstration purposes,
the vorticity (along z) is constructed and populated from the 2D velocity field analytically. (a) Direct visualization of the ensemble. (b) Boxplot
visualization of the ensemble.


to the vortex behavior. We also see that the ensemble presents a co¬ 
herent set of frequencies of oscillation, with the exception of a few 
outliers that have different positions and shapes. We also see a great 
deal of variation in the depths of these pathlines, demonstrating a rel¬ 
atively greater variation in vorticity as we move down the eddy line. 

For all the results in this paper, the band depth analysis has been 
implemented in C++. The band inclusion tests were implemented 
using open-source geometry libraries (VTK and CGAL). The data
depth computations were performed on a quad core (3.20 GHz) desk¬ 
top computer. The data depth computations are independent and have 
been parallelized using the OpenMP API. However, no optimization 
has been carried out for the computation and evaluation of band in¬ 
clusion checks associated with Eq. (8). Therefore, the performance 
of the algorithm can be significantly improved up to interactive speed 
through proper parallelization of this stage and more efficient compu¬ 
tations [12]. The computation time also depends on the dimensionality 
of the data. With the current implementation, an ensemble consisting 
of 50 2D hurricane tracks with average length of 60 sample points 
along each pathline would take approximately 1 minute, whereas for 
3D cases, the computational time is dominated by the computations 
involved in Eq. (8) and might take up to 20 minutes on a desktop com¬ 
puter with specifications given above. 
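A rough count of the band inclusion tests helps explain the gap between the 2D and 3D timings. The sketch below assumes that every curve is tested against every band formed by j ensemble members at every sample position, which is one reading of "all the bands formed by subsets of the ensemble members"; the actual implementation may organize the work differently, and the function name is hypothetical.

```python
from math import comb


def band_inclusion_tests(n_members, j, n_samples):
    """Point-in-hull tests if each curve is checked against every j-subset."""
    return comb(n_members, j) * n_members * n_samples


# 2D hurricane example: 50 tracks, j = 3, about 60 samples per track.
tests_2d = band_inclusion_tests(50, 3, 60)

# Number of bands for an ensemble of 50 with j = 3 versus j = 5.
bands_j3 = comb(50, 3)
bands_j5 = comb(50, 5)
```

Under these assumptions, moving from j = 3 to j = 5 for an ensemble of 50 members increases the number of bands by roughly two orders of magnitude, before accounting for the higher cost of each 3D convex hull test.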

6 Conclusion & Future Work 

Robust statistical analysis and visualization of an ensemble of multi¬ 
variate curves, having minimal assumptions about the underlying dis¬ 
tribution, is a challenging task - specifically in situations where both 
spatial and global geometrical features are of interest. In this paper, 
we presented a nonparametric method of deriving robust and descrip¬ 
tive statistical information from an ensemble of multivariate curves 
based on the notion of data depth. A generalization of the boxplot
visualization strategy in 2D and 3D has also been proposed to visualize
the main features based on the order statistics inferred from the
ensemble. Unlike many other strategies, our work is robust to outliers.
We have demonstrated the utility of both a data depth analysis and a 
boxplot visualization for ensembles of multivariate curves in various 
applications. We have provided a comparison against other state-of- 
the-art approaches. We discussed how one can attain the desirable, 
established criteria within each application, as opposed to the other 
methodologies, to accomplish the challenging task of ensemble visu¬ 
alization. We have also provided qualitative feedback from domain 
experts in our hurricane prediction application. 

Some of the limitations of the proposed method shed light on po¬ 
tential research directions for future work. For example, the present 


method is not able to provide representative features of an ensemble
generated by a multimodal distribution; therefore, studying and visualizing
descriptive statistics for ensembles generated from multimodal
distributions still requires further investigation. For instance, complex
branching structures can emerge in curve ensembles generated from 
a multimodal distribution. Defining a parameterization-invariant no¬ 
tion of band depth is still an open and challenging problem. The main 
assumption in band depth analysis for various types of data is the es¬ 
tablishment of the correspondence over the domain. Geometrically, 
the band formed by multivariate curves can be thought of as a poly¬ 
tope in higher dimensions. Therefore, simplicial tessellation of this 
region can be used to define the band. However, efficient construction 
of this region remains a challenge. 